AST-148815 : Add CLAUDE.md with architecture and contract notes#20
AST-148815 : Add CLAUDE.md with architecture and contract notes#20cx-anurag-dalke wants to merge 2 commits into
Conversation
Captures the Parser/factory dispatch model, per-ecosystem quirks, and the invariants (0-based line numbers, "latest" sentinel, PackageManager strings) that downstream AST-CLI relies on. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Great job! No new security vulnerabilities introduced in this pull request |
cx-atish-jadhav
left a comment
There was a problem hiding this comment.
Thanks for getting this started — the content that's here is accurate and well-grounded in the code (verified the Parser interface signature, ParsersFactory, Package/Location structs, the 9e490aa commit reference, the 60% CI floor, and the Go 1.23 / toolchain 1.24.2 pins all match the repo).
Leaving inline comments for consistency with the JIRA epic template (AST-148815 parent). The epic lists these essential sections: Project Overview, Architecture, Repository Structure, Technology Stack, Development Setup, Coding Standards, Project Rules, Testing Strategy, Known Issues, External Integrations, Deployment, Performance, API/Interfaces, Security & Access, Logging, Debugging Steps. This PR covers Project Overview, Architecture, Project Rules (as "Invariants"), and Testing Strategy well — the inline comments point out the missing / thin sections plus a few small accuracy / clarity nits. Some sections (DB schema, deployment) are genuinely N/A for a parser library; recommend adding a one-liner saying so rather than omitting, so every repo's CLAUDE.md has a predictable shape.
|
|
||
| This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository. | ||
|
|
||
| ## Overview |
There was a problem hiding this comment.
Rename to ## Project Overview to match the JIRA epic template. The epic also asks for "Purpose and status" — please add a one-liner on status (active / maintained / experimental / deprecated).
|
|
||
| Go module that parses package manifests from multiple ecosystems (Maven, npm, Python, Go, .NET) and returns each declared dependency along with the **exact line/character range** of its declaration. Consumed by [AST-CLI](https://github.com/Checkmarx/ast-cli) to correlate manifest entries with Checkmarx runtime scans — so the `Locations` field is part of the public contract, not a debugging convenience. | ||
|
|
||
| ## Commands |
There was a problem hiding this comment.
The epic asks for a ## Development Setup section. Consider renaming this and adding: (a) prerequisites (Go ≥ 1.23, git), (b) a note that dependencies are vendored so no go mod download is required, (c) an example invocation against a fixture, e.g. go run ./cmd test/resources/pom.xml. Right now a new contributor has to guess what a valid input path looks like.
| go test -run TestName ./path/... # run a single test by name | ||
| go test ./... -coverprofile cover.out # CI gate: total coverage must be >= 60% | ||
| go build -o manifest-parser ./cmd # build CLI | ||
| go run ./cmd <manifest-file> # run CLI against a manifest |
There was a problem hiding this comment.
Consider showing a truncated sample output (a Package JSON snippet) so readers know what success looks like without opening cmd/main.go.
| go run ./cmd <manifest-file> # run CLI against a manifest | ||
| ``` | ||
|
|
||
| Dependencies are vendored (`vendor/`). Go version is pinned via `go.mod` (1.23 / toolchain 1.24.2). |
There was a problem hiding this comment.
Please promote this into a dedicated ## Technology Stack section (one of the epic's essential sections) listing: Go 1.23 / toolchain 1.24.2, github.com/stretchr/testify v1.8.4, golang.org/x/mod v0.24.0, stdlib encoding/xml + encoding/json. Explicitly state "no database" and "no web framework" so the N/A sections are unambiguous.
|
|
||
| Dependencies are vendored (`vendor/`). Go version is pinned via `go.mod` (1.23 / toolchain 1.24.2). | ||
|
|
||
| ## Architecture |
There was a problem hiding this comment.
Add a ## Repository Structure section (or subsection) with the top-level folder tree: cmd/, pkg/parser/, internal/parsers/{maven,npm,pypi,golang,dotnet}/, test/resources/, vendor/. The epic lists "Repository Structure — Folder organization" as its own essential section.
| Per-ecosystem parsers live under [internal/parsers/](internal/parsers/): | ||
| - `maven/` — parses `pom.xml` with `encoding/xml`, then re-scans the raw text to locate each `<dependency>` block line by line. Resolves `${property}` vars from `<properties>` and falls back to `<dependencyManagement>` for empty/ranged versions. Only **direct** `<dependencies>` are emitted (managed-only deps are intentionally skipped to avoid duplicates — see commit `9e490aa`). | ||
| - `npm/` — parses `package.json` plus, if present as a sibling file, `package-lock.json` (v1 and v2/v3 formats). Ranged specifiers (`^`, `~`, `*`, `>`, `<`) trigger a lookup in the lockfile; `isLockVersionGreater` compares part-by-part numerically to decide whether the lockfile version satisfies the spec. Without a lock match, ranged versions resolve to `"latest"`. | ||
| - `pypi/` — line-oriented scan of `requirements*.txt` / `packages*.txt`. **Only `package==version` is supported** — `pip freeze`, Poetry, and pip-tools output are explicitly out of scope (see README "Known Limitations"). Comments (`#`) and environment markers (`;`) are stripped. |
There was a problem hiding this comment.
The pypi ==-only limitation belongs in a dedicated ## Known Issues / Limitations section per the epic. Please consolidate there: pypi == only (no pip freeze / Poetry / pip-tools), npm ranged versions without a lockfile resolve to "latest", Maven managed-only deps not emitted, etc. Scatter-and-gather hurts discoverability.
| - `golang/` — uses `golang.org/x/mod/modfile` to parse `go.mod`, then uses the parser's line metadata to compute character offsets. | ||
| - `dotnet/` — three separate parsers sharing patterns: `csproj_parser.go` (`.csproj`), `directory_packages_props_parser.go` (central package management), `packages_config_parser.go` (legacy). Versions are read from either a `Version` attribute or a nested `<Version>` element; bracketed ranges become `"latest"`. | ||
|
|
||
| ### Invariants worth preserving |
There was a problem hiding this comment.
The epic names this section "Project Rules — Don'ts and constraints". Consider renaming (or using both: ### Project Rules (Invariants)) for cross-repo consistency — the standardization goal is about predictable headings across repos, not just content.
|
|
||
| ### Invariants worth preserving | ||
|
|
||
| - **`Location` uses 0-based line numbers** in most parsers (Maven, Go, npm, pypi use `lineNum - 1` or a 0-based counter). Downstream AST-CLI depends on this; don't "fix" it to 1-based without coordinating. |
There was a problem hiding this comment.
Two ambiguities in Location worth clarifying: are StartIndex / EndIndex 0-based or 1-based, and are they byte offsets or rune/character offsets? AST-CLI callers need to know to render the annotation correctly for non-ASCII manifests.
|
|
||
| ## Tests & fixtures | ||
|
|
||
| Each parser has a `*_test.go` next to it using `testify`. Shared fixtures live in [test/resources/](test/resources/) (e.g. `pom.xml`, `package.json`, `requirements.txt`, `test_go.mod`, `Bootstrap.csproj`, `Gateway.csproj`, `packages.config`, `Directory.Packages.props`). When adding behaviors, add a fixture here rather than embedding large manifests in test source. |
There was a problem hiding this comment.
Please add: (a) how to view coverage locally (go tool cover -html cover.out), (b) the expected pattern for a new parser (fixture under test/resources/ + *_test.go co-located with parser using testify), (c) any naming convention for fixtures. These answer "how do I add a test?" which is the epic's intent for this section.
|
|
||
| Each parser has a `*_test.go` next to it using `testify`. Shared fixtures live in [test/resources/](test/resources/) (e.g. `pom.xml`, `package.json`, `requirements.txt`, `test_go.mod`, `Bootstrap.csproj`, `Gateway.csproj`, `packages.config`, `Directory.Packages.props`). When adding behaviors, add a fixture here rather than embedding large manifests in test source. | ||
|
|
||
| CI ([.github/workflows/ci.yml](.github/workflows/ci.yml)) enforces a **60% total coverage floor** — adding an untested branch to an already-thin package can push the whole repo below the gate. |
There was a problem hiding this comment.
Missing sections from the epic template — please add, even as one-liners for the N/A ones, so every repo's CLAUDE.md has a predictable shape:
- External Integrations — consumed by AST-CLI;
Locationsfield +PackageManagerstrings (mvn/npm/pypi/go/nuget) are load-bearing downstream. - Deployment — N/A (library; consumed via
go get github.com/Checkmarx/manifest-parser). - Database Schema — N/A.
- Performance Considerations — Maven re-scans raw XML after
encoding/xmlparse (two passes); no streaming; largepom.xmlfiles load fully into memory. - Security & Access — parsers consume untrusted manifest files. Note XXE posture (
encoding/xmldoesn't resolve external entities by default — worth stating explicitly) and whether there's a file-size bound. - Logging — CLI uses
log.Fatalfon error; library returnserrorand does not log. Callers should not expect library log output. - Debugging Steps — how to run one parser against one fixture, how to set
-vfor verbose tests, common failure mode (location off-by-one → check 0-based invariant). - Coding Standards —
gofmt/go vetclean; exported identifiers inpkg/, internal logic ininternal/; parser packages follow<ecosystem>/<ecosystem>_parser.go+<ecosystem>_parser_test.golayout.
* feat: add Gradle parser feature branch * feat: update Gradle parser and add implementation plan * updated gradle-parser * added support for gradle libs.versions.toml * Add CLAUDE.md with architecture and contract notes Captures the Parser/factory dispatch model, per-ecosystem quirks, and the invariants (0-based line numbers, "latest" sentinel, PackageManager strings) that downstream AST-CLI relies on. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * updated readme file * docs: expand CLAUDE.md to meet JIRA epic template requirements Addresses all inline review comments from PR #20 review: - Rename Overview → Project Overview, add status line - Add Technology Stack section (Go 1.23, testify, x/mod, stdlib, no DB) - Add Repository Structure section with folder tree - Rename Commands → Development Setup; add prerequisites, clone step, coverage HTML command, and sample JSON output - Add API / Interfaces section with full struct definitions including clarification that StartIndex/EndIndex are 0-based byte offsets - Update Architecture to include Gradle parser (missed in original) - Replace commit hash reference (9e490aa) with PR #15 link - Rename Invariants → Project Rules (Invariants); add PackageManager string for gradle and the StartIndex/EndIndex byte-offset clarification - Rename Tests & fixtures → Testing Strategy; add fixture tree, coverage HTML command, and expected new-parser pattern - Add Known Issues / Limitations section (consolidates pypi, npm, maven, dotnet, and all-parsers limitations) - Add External Integrations section (AST-CLI contract fields) - Add Deployment section (N/A — library, not a service) - Add Performance Considerations section (Maven two-pass, Gradle catalog, no caching) - Add Security & Access section (XXE posture, no file-size limit, no network calls) - Add Logging section (library vs CLI behaviour) - Add Coding Standards section (gofmt/vet, pkg vs internal, naming) - Add Debugging Steps section (5 concrete steps) Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com> * Add SBT (Scala Build Tool) manifest parser support Implement a production-grade SBT parser that extracts dependencies from all .sbt files (build.sbt, plugins.sbt, dependencies.sbt, etc.). The parser supports val/lazy val/def variable declarations, all SBT operators (%, %%, %%%), Seq blocks, addSbtPlugin syntax, dependency modifiers (exclude, excludeAll, intransitive, withSources, withJavadoc, cross, classifier), block and inline comments, scope annotations, dependencyOverrides, and duplicate package detection. Includes 29 unit tests at 97.8% coverage with test fixtures containing known-vulnerable packages (Log4Shell, Jackson, Struts2, commons-collections, SnakeYAML) for security scanning validation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Fix gradle 0-based line contract and harden parsers for downstream IDE integration - gradle: emit 0-based line numbers (was off-by-one, broke IDE decorations) - gradle: default empty catalog versions to "latest" (was causing 400 from realtime-scanner) - gradle: multi-line dependency locations with rawLines tracking - maven/dotnet/golang: strip trailing \r so byte offsets are correct on CRLF files - CLAUDE.md: strengthen 0-based contract, add SBT to parser list - plugins.sbt: add known-vulnerable packages so IDE decorations can be visually verified Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Merge Python parsers and fix Windows CRLF EndIndex bug Integrated Poetry, Setuptools (setup.cfg/setup.py), and enhanced PyPI parsers from Sumit's implementation with existing Gradle/SBT support. All Python parsers return PackageManager="pypi" per design spec. Added testdata fixtures and comprehensive test coverage. Key changes: - Added internal/parsers/{poetry,setuptools}/ with full test suites - Enhanced internal/parsers/pypi/ with support for 6 Python formats - Fixed pre-existing CRLF line ending bug affecting golang, dotnet, maven on Windows - Updated manifest-file-selector.go with SBT, Poetry, and Setuptools routing - Updated parser_factory.go with dispatchers for all Python ecosystems - Updated CLAUDE.md with complete architecture and design pattern documentation All parsers pass unit tests. Manifest-parser builds and runs successfully. Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com> * gradle and sbt wildcard support * adding sha for checkout actions * Using correct sha for cehckout actions in release.yml * log removal and minor fixes * Support Android flavor/buildtype dependency configurations in Gradle parser Fixes AST-160218. The Gradle parser previously matched only a hardcoded list of configuration keywords, missing all dynamically-generated Android variants: freeImplementation, paidDebugImplementation, debugApi, kaptTest, etc. These are auto-generated by the Android Gradle Plugin from product flavors and build types and contain real dependencies that need scanning. Changed configKeywords from exact-match to suffix-based regex covering: - *Implementation, *Api, *CompileOnly, *RuntimeOnly (any flavor/buildtype) - *Kapt, *Ksp, *AnnotationProcessor (test/androidTest scopes) - classpath, lintChecks (exact, no variants) Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com> * Changes test data location for pypi-parser tests * review comments implemented * replaced action with gh command --------- Co-authored-by: cx-anurag-dalke <120229307+cx-anurag-dalke@users.noreply.github.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Adds a CLAUDE.md at the repo root documenting the Parser interface / ParsersFactory dispatch model and how to wire a new ecosystem.
Captures per-parser quirks (Maven multi-line Locations, npm lockfile fallback, pypi ==-only scope, dotnet variants).
Documents invariants downstream AST-CLI depends on: 0-based line numbers, literal "latest" for unresolvable versions, stable PackageManager strings (mvn/npm/pypi/go/nuget).
Notes the 60% CI coverage floor.